Implementation Plan: Data Models and Documentation Generation Foundation
Branch: 001-data-models-docs-foundation | Date: 2026-02-14 | Spec: spec.md
Input: Feature specification from /specs/001-data-models-docs-foundation/spec.md
Note: This template is filled in by the /speckit.plan command. See .specify/templates/commands/plan.md for the execution workflow.
Summary
Build the foundational data models and documentation generation system for a CUI compliance Ansible framework. This feature establishes the single source of truth for all compliance data through YAML data models (control mappings, glossary, HPC tailoring, ODP values) and provides Python-based tooling for generating audience-specific documentation and validating glossary coverage. No Ansible roles are implemented, only the structured data and generation/validation scripts that all subsequent compliance implementation specs depend on.
Technical Context
Language/Version: Python 3.9+ (per constitution tech stack)
Primary Dependencies: PyYAML (YAML parsing), Jinja2 (templating for doc generation), pytest (testing), NEEDS CLARIFICATION (YAML schema validation library)
Storage: File-based YAML (control_mapping.yml, terms.yml, hpc_tailoring.yml, odp_values.yml) + generated Markdown/CSV outputs
Testing: pytest for unit tests, YAML validation tests, doc generation integration tests
Target Platform: RHEL 9 / Rocky Linux 9 (per constitution), command-line tooling
Project Type: Data models + CLI scripts (Ansible project skeleton with Python tooling)
Performance Goals: Documentation generator completes all 7 outputs in <30 seconds (SC-004), NEEDS CLARIFICATION (YAML load time for 110+ controls)
Constraints: Deterministic output (same YAML → same docs), CI-friendly exit codes, Excel-compatible CSV, GitHub-flavored Markdown, NEEDS CLARIFICATION (YAML schema enforcement approach)
Scale/Scope: 110 NIST 800-171 Rev 2 controls + 97 Rev 3 requirements, 60+ glossary terms, 49 ODPs, 10+ HPC tailoring entries, 7 doc output types
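The "Excel-compatible CSV" and "deterministic output" constraints can be illustrated with a minimal sketch using only the standard library. The column names in FIELDS and both function names are hypothetical placeholders, not the schema from data-model.md; the planned generator renders crosswalk.csv through a Jinja2 template, so this is only one possible way to meet the same constraints.

```python
import csv
import io

# Hypothetical column names (assumption); the real schema lives in data-model.md.
FIELDS = ["control_id", "rev2", "rev3", "cmmc_l2", "sp800_53_r5"]


def render_crosswalk_csv(controls):
    """Render control rows as CSV text.

    Rows are sorted by control_id (plain string order) so the output does
    not depend on the order in which the YAML entries were loaded.
    """
    buffer = io.StringIO(newline="")
    # The "excel" dialect quotes fields as Excel expects and ends rows with \r\n.
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, dialect="excel")
    writer.writeheader()
    for row in sorted(controls, key=lambda c: c["control_id"]):
        # Missing mappings are written as an explicit "N/A" rather than left blank.
        writer.writerow({name: row.get(name, "N/A") for name in FIELDS})
    return buffer.getvalue()


def write_crosswalk(path, controls):
    """Write the CSV with a UTF-8 BOM so Excel detects the encoding."""
    with open(path, "w", encoding="utf-8-sig", newline="") as handle:
        handle.write(render_crosswalk_csv(controls))
```

Sorting before writing and avoiding timestamps in the output are what keep two runs over the same YAML byte-identical.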
Constitution Check
GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.
Principle I: Plain Language First
✅ PASS - Feature directly implements glossary with plain-language explanations for all 5 audiences (PI, researcher, sysadmin, CISO, leadership). Glossary validator enforces no undefined jargon.
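One way to picture the "no undefined jargon" check is the sketch below. It assumes jargon can be approximated as acronym-like tokens (two or more capital letters or digits, starting with a letter); the actual detection rule, the function names, and the exit-code convention are assumptions, not the behavior of validate_glossary.py.

```python
import re

# Assumption: "jargon" is approximated as acronym-like tokens such as CUI or ODP.
ACRONYM = re.compile(r"\b[A-Z][A-Z0-9]{1,}\b")


def undefined_terms(text, glossary_terms):
    """Return acronyms found in text that have no glossary entry, sorted."""
    known = {term.upper() for term in glossary_terms}
    found = set(ACRONYM.findall(text))
    return sorted(found - known)


def exit_code(text, glossary_terms):
    """CI-friendly result: 0 when every acronym is defined, 1 otherwise."""
    return 1 if undefined_terms(text, glossary_terms) else 0
```

A real validator would likely also need an allow-list for capitalized words that are not jargon (for example "PASS" or "YAML" headings), which this sketch leaves out.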
Principle II: Data Model as Source of Truth
✅ PASS - YAML files (control_mapping.yml, terms.yml, hpc_tailoring.yml, odp_values.yml) are single source; all documentation is generated, never duplicated.
Principle III: Compliance as Code
✅ PASS - Control mapping includes Ansible role assignment placeholders and control tagging structure for future implementation. This feature establishes the data foundation for compliance-as-code.
Principle IV: HPC-Aware
✅ PASS - hpc_tailoring.yml explicitly documents 10+ HPC/security conflicts with compensating controls, risk acceptance, and NIST 800-223 references.
Principle V: Multi-Framework
✅ PASS - Control mapping covers all 4 frameworks simultaneously (NIST 800-171 Rev 2/3, CMMC L2, 800-53 R5) with explicit "N/A" + rationale for missing mappings.
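The rule that every control carries all four framework mappings, with "N/A" allowed only alongside a rationale, could be enforced roughly as follows. The project structure lists Pydantic models under scripts/models/; this sketch uses standard-library dataclasses instead so it runs without dependencies, and the framework keys and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical framework keys (assumption); the real keys are set by the YAML schema.
REQUIRED_FRAMEWORKS = (
    "nist_800_171_r2",
    "nist_800_171_r3",
    "cmmc_l2",
    "nist_800_53_r5",
)


@dataclass(frozen=True)
class FrameworkMapping:
    framework: str
    reference: str
    rationale: Optional[str] = None

    def __post_init__(self):
        if not self.reference.strip():
            raise ValueError(f"{self.framework}: reference must not be empty")
        # "N/A" is only acceptable when the entry explains why no mapping exists.
        if self.reference == "N/A" and not (self.rationale or "").strip():
            raise ValueError(f"{self.framework}: 'N/A' requires a rationale")


def validate_control(mappings):
    """Check one control's mappings (framework key -> dict) and return them typed."""
    missing = [name for name in REQUIRED_FRAMEWORKS if name not in mappings]
    if missing:
        raise ValueError(f"missing framework mappings: {', '.join(missing)}")
    return [
        FrameworkMapping(framework=name, **mappings[name])
        for name in REQUIRED_FRAMEWORKS
    ]
```

Failing at load time, rather than at render time, keeps an incomplete mapping from silently reaching any generated document.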
Principle VI: Audience-Aware Documentation
✅ PASS - Documentation generator produces 7 distinct audience-specific outputs from single YAML source (PI guide, researcher quickstart, sysadmin reference, CISO map, leadership briefing, glossary, crosswalk).
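The seven outputs correspond one-to-one with the template files named in the project structure. A small sketch of that mapping, assuming each output path is simply the template name with its .j2 suffix removed (the function name and the out_dir default are illustrative):

```python
from pathlib import PurePosixPath

# Template names as listed under templates/ in the project structure.
TEMPLATES = [
    "pi_guide.md.j2",
    "researcher_quickstart.md.j2",
    "sysadmin_reference.md.j2",
    "ciso_compliance_map.md.j2",
    "leadership_briefing.md.j2",
    "glossary_full.md.j2",
    "crosswalk.csv.j2",
]


def output_path(template_name, out_dir="docs/generated"):
    """Map a template file name to the generated file it produces."""
    suffix = ".j2"
    if not template_name.endswith(suffix):
        raise ValueError(f"not a Jinja2 template name: {template_name}")
    return PurePosixPath(out_dir) / template_name[: -len(suffix)]
```

Deriving output names from template names means adding an eighth audience would only require a new template, not a change to a separate output list.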
Principle VII: Idempotent and Auditable
✅ PASS - Control mapping includes placeholders for verify.yml/evidence.yml task files. Doc generator is deterministic (same input → same output).
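Determinism can be tested by rendering the same data twice and comparing digests. The sketch below uses a toy plain-text renderer in place of the planned Jinja2 templates; render_glossary and digest are hypothetical helpers for illustration only.

```python
import hashlib


def render_glossary(terms):
    """Toy renderer: one 'term: definition' line per entry, in sorted order."""
    lines = ["Glossary", ""]
    for name in sorted(terms):  # sorted keys: output ignores dict insertion order
        lines.append(f"{name}: {terms[name]}")
    return "\n".join(lines) + "\n"


def digest(text):
    """SHA-256 of the rendered text, suitable for comparing two runs."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

The usual sources of non-determinism are unsorted iteration and embedded timestamps, so a test along these lines would typically load the YAML in two different orders and assert that the digests match.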
Principle VIII: Prefer Established Tools
✅ PASS - Uses PyYAML (standard YAML library), Jinja2 (established templating), pytest (standard Python testing). No custom parsers/generators where established tools exist.
Gate Status: ✅ ALL PRINCIPLES SATISFIED - Proceed to Phase 0 Research
Project Structure
Documentation (this feature)
specs/001-data-models-docs-foundation/
├── plan.md              # This file (implementation plan)
├── research.md          # Phase 0 output (technology decisions)
├── data-model.md        # Phase 1 output (YAML schemas)
├── quickstart.md        # Phase 1 output (usage guide)
├── contracts/           # Phase 1 output (script interfaces, no APIs)
│   └── README.md
└── tasks.md             # Phase 2 output (/speckit.tasks command - NOT created by /speckit.plan)
Source Code (repository root)
This is an Ansible project with Python tooling. Structure follows Ansible best practices with compliance data models:
rcd-cui/
├── ansible.cfg                          # Ansible configuration
├── inventory/                           # Ansible inventory
│   ├── hosts.yml
│   └── group_vars/
│       ├── all.yml
│       ├── management.yml
│       ├── internal.yml
│       └── restricted.yml
├── roles/                               # Ansible roles (empty initially, populated in future specs)
│   └── common/
│       └── vars/
│           └── control_mapping.yml      # CANONICAL DATA MODEL (110+ controls)
├── docs/                                # Documentation source and generated output
│   ├── glossary/
│   │   └── terms.yml                    # CANONICAL GLOSSARY (60+ terms)
│   ├── hpc_tailoring.yml                # HPC-specific control tailoring (10+ entries)
│   ├── odp_values.yml                   # Organization-Defined Parameters (49 ODPs)
│   └── generated/                       # Generated documentation (ephemeral)
│       ├── pi_guide.md
│       ├── researcher_quickstart.md
│       ├── sysadmin_reference.md
│       ├── ciso_compliance_map.md
│       ├── leadership_briefing.md
│       ├── glossary_full.md
│       └── crosswalk.csv
├── scripts/                             # Python automation scripts
│   ├── generate_docs.py                 # Documentation generator
│   ├── validate_glossary.py             # Glossary coverage validator
│   └── models/                          # Pydantic data models
│       ├── __init__.py
│       ├── control_mapping.py
│       ├── glossary.py
│       ├── hpc_tailoring.py
│       └── odp_values.py
├── templates/                           # Jinja2 templates for doc generation
│   ├── pi_guide.md.j2
│   ├── researcher_quickstart.md.j2
│   ├── sysadmin_reference.md.j2
│   ├── ciso_compliance_map.md.j2
│   ├── leadership_briefing.md.j2
│   ├── glossary_full.md.j2
│   ├── crosswalk.csv.j2
│   └── _partials/
│       ├── glossary_link.j2
│       ├── control_table.j2
│       └── header.j2
├── tests/                               # Pytest tests
│   ├── test_yaml_schemas.py             # Validate all YAML data models
│   ├── test_generate_docs.py            # Doc generator integration tests
│   └── test_glossary_validator.py       # Glossary validator unit tests
├── Makefile                             # Build targets (docs, validate, crosswalk, clean)
├── requirements.txt                     # Python dependencies (PyYAML, Pydantic, Jinja2, pytest)
├── README.md                            # Project overview and usage
└── .specify/                            # Specify framework artifacts
    └── memory/
        └── constitution.md              # Project constitution
Structure Decision: Ansible project structure with Python tooling. This feature establishes the data foundation (4 YAML files) and documentation generation pipeline (Python scripts + Jinja2 templates). No Ansible roles are implemented yet; those come in future specs. The structure separates:
- Canonical Data (roles/common/vars/, docs/glossary/, docs/*.yml) - Single source of truth, version-controlled
- Generated Artifacts (docs/generated/) - Ephemeral, regenerated from YAML sources
- Tooling (scripts/, templates/) - Python generators and validators
- Tests (tests/) - Schema validation and integration tests
This aligns with Constitution Principle II (Data Model as Source of Truth) and Principle VI (Audience-Aware Documentation).
Complexity Tracking
No constitution violations. All principles satisfied:
- ✅ Established tools (Pydantic, PyYAML, Jinja2, pytest)
- ✅ Data model as source of truth (YAML canonical, docs generated)
- ✅ Plain language first (glossary with 5-audience context)
- ✅ HPC-aware (explicit tailoring document)
- ✅ Multi-framework (4 frameworks in single data model)
No complexity justification required.